Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge

Authors: Guan Wenhao, Hong Qingyang, Huang Hukai, Li Lin, Li Tao, Li Yishuang
Publication date: 11/07/2023

With the growing demand for autonomous control and personalized speech generation, style control and transfer in Text-to-Speech (TTS) is becoming increasingly important. In this paper, we propose a new TTS system that performs style transfer with interpretability and high fidelity. First, we design a TTS system that combines a variational autoencoder (VAE) with a diffusion refiner to obtain refined mel-spectrograms. Specifically, we design both a two-stage and a one-stage system to improve audio quality and style transfer performance. Second, we design a diffusion bridge for the quantized VAE to efficiently learn complex discrete style representations and further improve style transfer. To further strengthen style transfer, we introduce ControlVAE to improve reconstruction quality while retaining good interpretability. Experiments on the LibriTTS dataset demonstrate that our method is more effective than baseline models.

Comment: Accepted at Interspeech202

arXiv.org e-Print Archive

A Chunk-Based Reordering Model for Phrase-Based SMT Systems

Authors: Chen Yidong (陈毅东), Hong Qingyang, Shi Xiaodong, Zhou Changle (周昌乐)
Publication date: 01/01/2008

This paper proposes a novel reordering model based on the reordering of source-language chunks. The model is used as a preprocessing step for phrase-based translation models and integrates well with them. Because it is chunk-based, syntactic information can be taken into account during reordering without requiring a full parse of the source sentence. Two experiments were carried out, and the results show that the proposed model greatly improves the performance of a phrase-based statistical machine translation (SMT) system.

Xiamen University Institutional Repository

Translation memory sharing models in XMCAT

Authors: Chen Yidong (陈毅东), Hao Q, Hawryszhiewycz I, Hong Qingyang, Li Tangqiu, Lin Z, Maher ML, Shen W, Shi Xiaodong, Yang Y, Zhou Changle
Publication date: 01/01/2007

This paper describes in detail two Translation Memory (TM) sharing models adopted in XMCAT, a Computer Assisted Translation (CAT) tool that supports cooperative work in machine translation. The first is a Center-based TM sharing model, which is suitable only for users on a local area network (LAN). The second is a novel P2P-based TM sharing model, which geographically distributed users can use over the Internet. With these two models, a user can share data with other users over the network, further reducing repeated work and making cooperation easier. The paper also presents the methods XMCAT uses to handle the problem of multiple translations that arises in the cooperative memory sharing models. The XMCAT system has been adopted and approved by some translation companies.

Xiamen University Institutional Repository